15 research outputs found

    Detecting gross alignment errors in the Spoken British National Corpus

    The paper presents methods for evaluating the accuracy of alignments between transcriptions and audio recordings. The methods have been applied to the Spoken British National Corpus, which is an extensive and varied corpus of natural unscripted speech. Early results show good agreement with human ratings of alignment accuracy. The methods also provide an indication of the location of likely alignment problems; this should allow efficient manual examination of large corpora. Automatic checking of such alignments is crucial when analysing any very large corpus, since even the best current speech alignment systems will occasionally make serious errors. The methods described here use a hybrid approach based on statistics of the speech signal itself, statistics of the labels being evaluated, and statistics linking the two. Comment: Four pages, 3 figures. Presented at "New Tools and Methods for Very-Large-Scale Phonetics Research", University of Pennsylvania, January 28-31, 201
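The abstract does not give the paper's actual statistics, but the idea of flagging gross alignment errors from label statistics can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes hypothetical per-phone duration statistics and flags aligned segments whose duration is wildly implausible (a large z-score often means the aligner drifted).

```python
# Hypothetical per-phone duration statistics (mean, sd in seconds),
# standing in for statistics estimated from a large corpus.
DURATION_STATS = {
    "AA": (0.09, 0.03),
    "S":  (0.10, 0.03),
    "T":  (0.06, 0.02),
}

def flag_gross_errors(alignment, z_threshold=4.0):
    """Return aligned segments whose duration is wildly implausible.

    `alignment` is a list of (phone, start_s, end_s) tuples. A very
    large z-score suggests a gross error, e.g. a short stop consonant
    stretched over several seconds of audio.
    """
    flagged = []
    for phone, start, end in alignment:
        mean, sd = DURATION_STATS.get(phone, (0.08, 0.04))
        z = abs((end - start) - mean) / sd
        if z > z_threshold:
            flagged.append((phone, start, end, z))
    return flagged

alignment = [("S", 0.00, 0.11), ("AA", 0.11, 0.20), ("T", 0.20, 3.50)]
print(flag_gross_errors(alignment))  # only the 3.3 s "T" is flagged
```

A real system would combine this label-side check with signal-side statistics, as the abstract describes; duration alone is only one cue.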

    Developing a large scale population screening tool for the assessment of Parkinson's disease using telephone-quality voice

    Recent studies have demonstrated that analysis of laboratory-quality voice recordings can be used to accurately differentiate people diagnosed with Parkinson's disease (PD) from healthy controls (HC). These findings could help facilitate the development of remote screening and monitoring tools for PD. In this study, we analyzed 2759 telephone-quality voice recordings from 1483 PD and 15321 recordings from 8300 HC participants. To account for variations in phonetic backgrounds, we acquired data from seven countries. We developed a statistical framework for analyzing voice, whereby we computed 307 dysphonia measures that quantify different properties of voice impairment, such as breathiness, roughness, monopitch, hoarse voice quality, and exaggerated vocal tremor. We used feature selection algorithms to identify robust parsimonious feature subsets, which were used in combination with a Random Forests (RF) classifier to accurately distinguish PD from HC. The best 10-fold cross-validation performance was obtained using Gram-Schmidt Orthogonalization (GSO) and RF, leading to mean sensitivity of 64.90% (standard deviation, SD 2.90%) and mean specificity of 67.96% (SD 2.90%). This large-scale study is a step towards the development of a reliable, cost-effective and practical clinical decision support tool for screening the population at large for PD using telephone-quality voice. Comment: 43 pages, 5 figures, 6 tables
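The Gram-Schmidt step mentioned above is a forward feature-selection strategy: pick the feature most correlated with the target, project it out of the remaining features so redundant copies are not re-selected, and repeat. A rough NumPy sketch of that idea, on synthetic data (the data, sizes, and feature count here are illustrative, not the study's):

```python
import numpy as np

def gso_feature_ranking(X, y, k):
    """Rank k features by Gram-Schmidt orthogonalization (a sketch).

    At each step, select the feature whose (orthogonalized) column is
    most correlated with the target, then project that column out of
    the remaining columns so redundant features score near zero.
    """
    X = (X - X.mean(axis=0)).astype(float)
    y = y - y.mean()
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        # absolute cosine similarity of each remaining column with y
        scores = {}
        for j in remaining:
            col = X[:, j]
            norm = np.linalg.norm(col) * np.linalg.norm(y)
            scores[j] = abs(col @ y) / norm if norm > 0 else 0.0
        best = max(remaining, key=lambda j: scores[j])
        selected.append(best)
        remaining.remove(best)
        # orthogonalize remaining columns against the chosen one
        q = X[:, best] / np.linalg.norm(X[:, best])
        for j in remaining:
            X[:, j] -= (q @ X[:, j]) * q
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=200)   # redundant copy of feature 0
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=200)
print(gso_feature_ranking(X, y, 2))  # feature 0 (or its copy 3) first, then 1
```

Note that after the first pick, the redundant copy scores near zero, so the second pick is the genuinely informative feature 1; the selected subset would then feed a classifier such as the RF used in the study.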

    The effects of delayed auditory and visual feedback on speech production

    Monitoring the sensory consequences of articulatory movements supports speaking. For example, delaying auditory feedback of a speaker's voice disrupts speech production. Also, there is evidence that this disruption may be decreased by immediate visual feedback, i.e., seeing one's own articulatory movements. It is, however, unknown whether delayed visual feedback affects speech production in fluent speakers. Here, the effects of delayed auditory and visual feedback on speech fluency (i.e., speech rate and errors), vocal control (i.e., intensity and pitch), and speech rhythm were investigated. Participants received delayed (by 200 ms) or immediate auditory feedback, while repeating sentences. Moreover, they received either no visual feedback, immediate visual feedback, or delayed visual feedback (by 200, 400, and 600 ms). Delayed auditory feedback affected fluency, vocal control, and rhythm. Immediate visual feedback had no effect on any of the speech measures when it was combined with delayed auditory feedback. Delayed visual feedback did, however, affect speech fluency when it was combined with delayed auditory feedback. In sum, the findings show that delayed auditory feedback disrupts fluency, vocal control, and rhythm and that delayed visual feedback can strengthen the disruptive effect of delayed auditory feedback on fluency.
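The core apparatus in such experiments is simply a fixed delay line between the microphone and the headphones. As a toy illustration of the 200 ms condition (the sample rate and signal here are assumptions, not from the study), a delay can be applied by prepending the corresponding number of zero samples:

```python
import numpy as np

def delay_feedback(signal, delay_ms, sample_rate=44100):
    """Delay a mono signal by a fixed number of milliseconds, as in a
    delayed-auditory-feedback setup (offline illustration only; a real
    experiment uses a low-latency real-time audio pipeline)."""
    n = int(round(sample_rate * delay_ms / 1000.0))
    return np.concatenate([np.zeros(n, dtype=signal.dtype), signal])

# 100 ms of a 440 Hz tone standing in for the speaker's voice
tone = np.sin(2 * np.pi * 440 * np.arange(4410) / 44100).astype(np.float32)
delayed = delay_feedback(tone, 200)
print(len(delayed) - len(tone))  # 8820 samples = 200 ms at 44.1 kHz
```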

    Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders

    Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders provides a survey of methods designed to aid clinicians in the diagnosis and monitoring of speech disorders such as dysarthria and dyspraxia. The emphasis is on the signal processing techniques, the statistical validity of the results presented in the literature, and the appropriateness of methods that do not require specialized equipment, rigorously controlled recording procedures, or highly skilled personnel to interpret results. Such techniques offer the promise of a simple, cost-effective, yet objective assessment of a range of medical conditions, which would be of great value to clinicians. The ideal scenario would begin with the collection of examples of the clients’ speech, either over the phone or using portable recording devices operated by non-specialist nursing staff. The recordings could then be analyzed initially to aid diagnosis of conditions, and subsequently to monitor the clients’ progress and response to treatment. Automating this process would allow more frequent and regular assessments to be performed, as well as providing greater objectivity.

    A phonologically calibrated acoustic dissimilarity measure

    One of the most basic comparisons between objects is to ask how similar they are. Linguistics and phonology are founded on this question. The classic definitions of phonemes and features involve contrast between minimal pairs. A minimal pair of words requires that there be two sounds that are dissimilar enough for the words to be considered different. Otherwise we wouldn't speak of a minimal pair of words but rather of a single word with two meanings. Likewise, phonetic similarity is needed to group together separate instances into a single word or sound. Without some intuition about which sounds are so similar that they should be treated as instances of the same linguistic object, the field would be no more than a collection of trillions of disconnected examples. So, it is important to have a measure of dissimilarity between sounds. Some exist already: e.g. measures of dissimilarity between the input and output of speech codecs have been used as a way of quantifying their performance (e.g. Gray, Gray, and Masuyama 1980). Cepstral distance and the Itakura-Saito divergence are also widely used. But none of these have been explicitly calibrated against the differences in speech that are important to human language. In this paper, we do so.
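The two standard measures the abstract names are easy to state concretely. As a rough NumPy sketch (the toy spectra below are assumptions for illustration; the paper's own calibrated measure is not reproduced here):

```python
import numpy as np

def itakura_saito(p, q, eps=1e-10):
    """Itakura-Saito divergence between two power spectra (a sketch).

    Non-negative, and zero only when the spectra match; note it is
    asymmetric, so D(p, q) != D(q, p) in general.
    """
    r = (p + eps) / (q + eps)
    return float(np.sum(r - np.log(r) - 1.0))

def cepstral_distance(p, q, n_coeff=12, eps=1e-10):
    """Euclidean distance between the first real-cepstrum coefficients
    of two power spectra (given on rfft bins)."""
    cp = np.fft.irfft(np.log(p + eps))[:n_coeff]
    cq = np.fft.irfft(np.log(q + eps))[:n_coeff]
    return float(np.linalg.norm(cp - cq))

f = np.linspace(0, 1, 128)
p = 1.0 / (1.0 + f**2)            # toy spectral envelope
q = 1.0 / (1.0 + (f - 0.1)**2)    # slightly shifted envelope
print(itakura_saito(p, p))        # 0.0
print(itakura_saito(p, q) > 0)    # True
```

Both measures compare spectra directly; the paper's point is that neither is calibrated against linguistically important differences, which is the gap it sets out to fill.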

    Precision of Phoneme Boundaries Derived using Hidden Markov Models

    Some phoneme boundaries correspond to abrupt changes in the acoustic signal. Others are less clear-cut because the transition from one phoneme to the next is gradual. This paper compares the phoneme boundaries identified by a large number of different alignment systems, using different signal representations and Hidden Markov Model structures. The variability of the different boundaries is analysed statistically, with the boundaries grouped in terms of the broad phonetic classes of the respective phonemes. The mutual consistency between the boundaries from the various systems is analysed to identify which classes of phoneme boundary can be identified reliably by an automatic labelling system, and which are ill-defined and ambiguous. The results presented here provide a starting point for future development of techniques for objective comparisons between systems without giving undue weight to variations in those phoneme boundaries which are inherently ambiguous. Such techniques should improve the efficiency with which new alignment and HMM training algorithms can be developed.
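The consistency analysis the abstract describes can be sketched very simply: for each boundary, collect the times assigned by the different systems and measure their spread per broad-class transition. The data and class labels below are hypothetical, purely to illustrate the kind of comparison involved:

```python
import statistics
from collections import defaultdict

# Hypothetical boundary times (seconds) for the same boundaries as
# placed by four alignment systems, labelled by broad-class transition.
boundaries = [
    ("stop->vowel",      [0.512, 0.514, 0.511, 0.513]),
    ("vowel->nasal",     [0.901, 0.934, 0.887, 0.922]),
    ("fricative->vowel", [1.204, 1.206, 1.205, 1.203]),
]

def boundary_spread(boundaries):
    """Mean standard deviation of boundary placement per transition
    class -- a proxy for how well-defined each boundary class is."""
    spread = defaultdict(list)
    for label, times in boundaries:
        spread[label].append(statistics.stdev(times))
    return {k: statistics.mean(v) for k, v in spread.items()}

spread = boundary_spread(boundaries)
# in this toy data, the abrupt transitions show far less spread than
# the gradual vowel->nasal transition
print(sorted(spread, key=spread.get))
```

A per-class spread like this is exactly the kind of quantity that lets a benchmark down-weight boundary classes that are inherently ambiguous.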